Sarcasm Detection on Czech and English Twitter

نویسندگان

  • Tomás Ptácek
  • Ivan Habernal
  • Jun Hong
چکیده

This paper presents a machine learning approach to sarcasm detection on Twitter in two languages – English and Czech. Although there has been some research in sarcasm detection in languages other than English (e.g., Dutch, Italian, and Brazilian Portuguese), our work is the first attempt at sarcasm detection in the Czech language. We created a large Czech Twitter corpus consisting of 7,000 manually-labeled tweets and provide it to the community. We evaluate two classifiers with various combinations of features on both the Czech and English datasets. Furthermore, we tackle the issues of rich Czech morphology by examining different preprocessing techniques. Experiments show that our language-independent approach significantly outperforms adapted state-of-the-art methods in English (F-measure 0.947) and also represents a strong baseline for further research in Czech (F-measure 0.582).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Putting Sarcasm Detection into Context: The Effects of Class Imbalance and Manual Labelling on Supervised Machine Classification of Twitter Conversations

Sarcasm can radically alter or invert a phrase’s meaning. Sarcasm detection can therefore help improve natural language processing (NLP) tasks. The majority of prior research has modeled sarcasm detection as classification, with two important limitations: 1. Balanced datasets, when sarcasm is actually rather rare. 2. Using Twitter users’ self-declarations in the form of hashtags to label data, ...

متن کامل

Tweet Sarcasm: Mechanism of Sarcasm Detection in Twitter

The www is growing at an alarming rate . Users have started participating actively on Internet by giving their opinions on products, services and blogs. Study and analysis of such opinions is known as Opinion Mining. But sometimes users prefer being sarcastic. Sarcasm is a linguistic phenomenon in which people state the opposite of what they actually mean. Sarcasm Detection is a challenging tas...

متن کامل

Contextualized Sarcasm Detection on Twitter

Sarcasm requires some shared knowledge between speaker and audience; it is a profoundly contextual phenomenon. Most computational approaches to sarcasm detection, however, treat it as a purely linguistic matter, using information such as lexical cues and their corresponding sentiment as predictive features. We show that by including extra-linguistic information from the context of an utterance ...

متن کامل

Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal

Sarcasm is a pervasive phenomenon in social media, permitting the concise communication of meaning, affect and attitude. Concision requires wit to produce and wit to understand, which demands from each party knowledge of norms, context and a speaker’s mindset. Insight into a speaker’s psychological profile at the time of production is a valuable source of context for sarcasm detection. Using a ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014